ABSTRACT


Social media now plays a vital role in influencing people's lives. Twitter,
Facebook, Instagram and similar services are the major social media platforms,
and they give users a place to voice opinions on the things and events around
them. Twitter is a micro blogging site that allows a user to post up to 2,400
tweets per day, each up to 280 characters long. Data analysts rely on this data
to draw conclusions about current events and to rate products. Because of the
massive volume of reviews, however, analysts find it difficult to read through
them and reach conclusions. To solve this problem we adopt sentiment analysis,
an approach that classifies the sentiment of user reviews, documents and
similar text as positive (good), negative (bad) or neutral (surprise). We
propose an enhanced Twitter sentiment analysis that retrieves data for a
baseline keyword within a predefined time span and performs sentiment analysis
using TextBlob. Unlike traditional and existing schemes, which perform
sentiment analysis on previously saved data, this scheme analyses real-time
data fetched via the Twitter API, thereby providing a more recent and relevant
conclusion.
I. INTRODUCTION

In the past few years, there has been a huge growth in the use of micro blogging
platforms such as Twitter. Spurred by that growth, companies and media
organizations are increasingly seeking ways to mine Twitter for information
about what people think and feel about their products and
services. Data analysts also use this data to study public
opinion about eminent personalities and current events.

The online medium has become a significant way for people
to express their opinions, and with social media there is an
abundance of opinion information available. Sentiment
analysis determines the polarity of an opinion (positive,
negative or neutral) by analyzing its text.
Sentiment analysis has been useful for gathering customers'
opinions on products, predicting outcomes
of elections, and extracting opinions from movie reviews. The
information gained from sentiment analysis is useful for
companies making future decisions.

Many traditional approaches to sentiment analysis use the
bag of words method. The bag of words technique does not
consider language morphology, and it can incorrectly
classify two phrases as having the same meaning because
they share the same bag of words. The relationship between
the collection of words is considered instead of the
relationship between individual words. When determining
the overall sentiment, the sentiment of each word is
determined and the values are combined using a function.
Bag of words also ignores word order, which causes phrases
containing negation to be classified incorrectly. Other
techniques discussed in sentiment analysis include Naive
Bayes, Maximum Entropy, and Support Vector Machines.
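
The word-order limitation described above can be illustrated with a
minimal sketch. The two-word lexicon and the helper names
bag_of_words and bow_score are hypothetical and serve only as an
illustration; they are not taken from any of the surveyed systems.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts for a phrase, discarding word order."""
    return Counter(text.lower().split())

# Hypothetical toy lexicon: +1 for a positive word, -1 for a negative word.
LEXICON = {"good": 1, "bad": -1}

def bow_score(text):
    """Sum per-word scores; a negation word such as 'not' carries no score."""
    bag = bag_of_words(text)
    return sum(LEXICON.get(word, 0) * count for word, count in bag.items())

a = "the plot was good not bad"
b = "the plot was bad not good"

print(bag_of_words(a) == bag_of_words(b))  # True: the two bags are identical
print(bow_score(a), bow_score(b))          # 0 0: opposite meanings, same score
```

Both phrases contain exactly the same words, so any function of the
bag alone must assign them the same sentiment even though their
meanings are opposite.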


Sentiment analysis refers to the broad area of natural 
language processing which deals with the computational 
study of opinions, sentiments and emotions expressed in 
text. Sentiment Analysis (SA) or Opinion Mining (OM) aims 
at learning people's opinions, attitudes and emotions 
towards an entity. The entity can represent individuals, 
events or topics. An immense amount of research has been
performed in the area of sentiment analysis. Most of it,
however, has focused on classifying formal and larger pieces
of text such as reviews.

With the wide popularity of social networking and micro 
blogging websites and an immense amount of data available 
from these resources, research projects on sentiment 
analysis have witnessed a gradual domain shift. The past few 
years have witnessed a huge growth in the use of micro 
blogging platforms. Popular micro blogging websites like 
Twitter have evolved to become a source of varied 
information. This diversity arises because micro blogs have
become platforms where people post real-time messages about
their opinions on a wide variety of topics, discuss current
affairs and share their experiences with products and
services they use in daily life. Stimulated by the
growth of micro blogging platforms, organizations are 
exploring ways to mine Twitter for information about how 
people are responding to their products and services. A fair 
amount of research has been carried out on how sentiments 
are expressed in formal text patterns such as product or 
movie reviews and news articles, but how sentiments are 
expressed given the informal language and message-length
constraints of micro blogging has been less explored. 

Twitter is a micro blogging service launched in 2006 that
currently has more than 550 million users. The status
messages created by users are called tweets. The public
timeline of Twitter displays tweets from users worldwide and
is an extensive source of real-time information. The original
concept behind micro blogging was to provide personal status
updates, but tweets now cover everything under the sun,
ranging from current political affairs to personal
experiences. Movie reviews, travel experiences, current
events and more add to the list. Tweets (and micro blogs in
general) differ from reviews in their basic structure.
While reviews are characterized by formal text patterns and
are the summarized thoughts of their authors, tweets are more
casual and restricted to 280 characters of text (140 before
November 2017). Tweets offer
companies an additional avenue to gather feedback.
Sentiment analysis of products, movie reviews and similar
content helps customers decide before making a purchase
or planning to see a movie. Enterprises find this area useful
for researching public opinion of their company and products,
or for analyzing customer satisfaction. Organizations use this
information to gather feedback about newly released
products, which helps improve later designs.
Different approaches, including machine learning (ML)
techniques, sentiment lexicons and hybrid approaches, have
proved useful for sentiment analysis on formal texts.
But their effectiveness for extracting sentiment from micro
blogging data remains to be explored. A careful
investigation of tweets reveals that the 280-character limit
(140 characters before November 2017) restricts the
vocabulary which imparts the sentiment.
The hyperlinks often present in tweets further restrict
the vocabulary size. The varied domains discussed
impose hurdles for training. The frequency of
misspellings and slang words in tweets (and micro blogs in
general) is much higher than in other language resources,
which is another hurdle that needs to be overcome. On the
other hand, the tremendous volume of data available
from micro blogging websites on varied domains is
unmatched by other data resources. Micro
blogging language is characterized by expressive
punctuation which conveys a lot of sentiment. Bold lettered
phrases, exclamations, question marks, quoted text and the
like leave scope for sentiment extraction. The proposed work
attempts a novel approach on Twitter data by aggregating an
adapted polarity lexicon learnt from product reviews of
the domains under consideration, tweet-specific features
and unigrams to build a classifier model using machine
learning techniques.

II. LITERATURE SURVEY 

This section covers other aspects of Twitter data usage,
with approaches entirely different from the one discussed
in this thesis. An analysis of the Big Data technologies
InfoSphere BigInsights and Apache Flume [6] was conducted by
Birjali et al. Multiple sets of data for various research
purposes were first collected from Twitter by Apache Flume,
stored in Hadoop, and then displayed with BigSheets after
being analyzed using InfoSphere BigInsights. They chose
Twitter as their Big Data source because of the increasingly
large amount of data generated daily by its users. This method
uses the Hadoop Distributed File System (HDFS) in order to
utilize the MapReduce feature, enabling the collection of
larger data sets (Tweets). MapReduce counts the number of
times a matching data set is iterated and then displays the
results. Apache Flume Next Generation (NG) was used to
collect the Tweets used in this case study. Flume NG uses a
process that first collects data (Tweets) from multiple
sources and holds them in memory, and then stores them in
HDFS using a JAQL script; JAQL is a data processing and
query language. After a thorough examination of InfoSphere
BigInsights analytics, a separate data collection tool
developed from Apache Flume was tested, and the results
were analyzed using InfoSphere BigInsights. It was
determined that the technique used by the tool developed
from Apache Flume was not only superior to older methods,
but faster as well.

A paper on the Intelligent Mining of Public
Social Networks Influence in Society (MISNIS) tool [7]
highlights several key limitations of current methods, such
as Twitter API restrictions and dependency on hashtags and
keywords for categorization, and demonstrates how MISNIS
overcomes these limitations, increasing productivity by 80%
and 40% respectively. MISNIS uses polarity sentiment
analysis, and does not use a language dependent lexicon.
While this approach is limiting, it does not negate MISNIS's
apparent superiority, and is open to further development in
future. Joao P. Carvalho and his collaborators [7]
demonstrate MISNIS by applying it to track, catalogue,
analyze, and trace current events in Portugal; however,
MISNIS can be applied in many other fields with various
other research questions. It can collect, store, manage, mine,
and display data by using Computational Intelligence,
Information Retrieval, Big Data, Topic Detection, User
Influence and Sentiment Analysis. This method uses
geolocation to collect Tweets within Twitter's API restriction
of 1% data collection, then traces the collected Tweets back
to the users' accounts to collect additional Tweets that meet
the search criteria from multiple Twitter API accounts. A file
of every viable user was created and maintained to facilitate
this process. MongoDB was used for all data storage, and a
REST API was used to handle the data once it was collected.
The REST API is also the tool used to collect data
from individual users. This method does not make collection
limitless, as it is still minimally restricted by Twitter.

An
insightful exploratory analysis demonstrates the
capabilities of the Tweets Characterization Methodology
(TCHARM) [8] to organize collected Tweets based on
geographical location, the time of the Tweet, and its
contents. TCHARM uses the Text And Spatio-TEmporal
(TAST) distance measure to group similar Tweets
based on all three categories. This means that TCHARM is
capable of grouping Tweets about the same or similar
subjects, from geographically close or specified regions, that
were Tweeted around the same time. The case study
conducted in that paper to demonstrate TCHARM's
performance searched for and categorized Tweets related to
the 2014 FIFA World Cup. Through this study it was
determined that the TAST measure utilized by TCHARM
produced a more even distribution of the three factors tested
for by TCHARM than did other methods. The authors also
address avenues for future work based on TCHARM's
limitations. One such limitation is the length of time it takes
to set the specifications of TCHARM's features. It is also
suggested that the K-means algorithm used by TCHARM may
collect too broad a range of Tweets containing the three
factors for categorization. While this means that some
collections of Tweets are more loosely related than is
desirable, it does not affect the overall higher efficiency
demonstrated by this method. TCHARM can handle a high
number of Tweets in its data collection because it uses
Apache Spark as its platform, and it collects Tweets quickly
on a recurring hourly basis.

III. LIVE TWEET ANALYSIS SYSTEM

In this system we assume that a user searches for tweets
related to a particular keyword at the current time using his
or her Twitter credentials, retrieves the tweets and finally
performs sentiment analysis on them in order to reach a
conclusion.

A. Architecture 

The following figure shows the architecture of the proposed
scheme. The system consists of four modules.

Creating Twitter API 

In order to retrieve live tweets based on a baseline keyword,
the user must first request authentication credentials from
Twitter.

Tweets Retrieval 

Here tweets are retrieved dynamically from the Twitter API
based on the keyword entered and the requested count.

Preprocessing 

The tweets are imported from the Twitter API into a .csv file.
These tweets contain unnecessary words, whitespace,
hyperlinks and special characters, so we first filter them
by removing all such unnecessary content.
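
A minimal sketch of such a filtering step, using only Python's
standard re module, is shown here. The function name clean_tweet and
the exact set of patterns removed (hyperlinks, @mentions, the RT
marker, special characters and surplus whitespace) are assumptions
made for illustration; the original work does not list its rules.

```python
import re

def clean_tweet(text):
    """Strip hyperlinks, mentions, the RT marker and special characters."""
    text = re.sub(r"https?://\S+", " ", text)     # hyperlinks
    text = re.sub(r"@\w+", " ", text)             # @mentions (assumed unwanted)
    text = re.sub(r"\bRT\b", " ", text)           # retweet marker (assumed unwanted)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # special characters
    return " ".join(text.split())                 # collapse surplus whitespace

raw = "RT @user: Loving the new phone!!! https://t.co/abc #happy"
print(clean_tweet(raw))  # Loving the new phone happy
```

Hashtag words are kept (only the # symbol is dropped) because they
often carry sentiment; this is a design choice of the sketch.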



Figure 1. System Architecture


Sentiment Analysis 

Sentiment analysis is finally performed on the preprocessed data.

B. Proposed Scheme 

In this method we use TextBlob to find the polarity of the
text (positive, negative or neutral). The tweets are imported
from Twitter using the API provided to Twitter developers.
Through this API, various fields such as tweet text, source,
retweets, likes, language and user can be scraped. After
collecting this data, we can analyse the thoughts of various
famous people on an event or occasion.



Fig 2 Architectural Flow of Twitter Analysis 

Figure 2 shows the extraction of tweet IDs from Twitter
through the API, followed by preprocessing of the extracted
data. Preprocessing includes excluding unwanted fields and
segregating the fields important for analysis. Once the fields
are extracted and segregated, a CSV file is created. Using this
CSV, the length of the message, likes and retweets for each ID
are extracted and various results are derived. The scraped
tweets are then classified as positive, negative or
neutral.
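
The classification step can be sketched as follows. TextBlob exposes
a polarity score between -1.0 and 1.0 through
TextBlob(text).sentiment.polarity; mapping that score to three labels
with a threshold at zero is an assumption of this sketch, as are the
helper names label_from_polarity and tweet_sentiment.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The zero threshold is an assumption made for illustration.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

def tweet_sentiment(text):
    """Label a cleaned tweet using TextBlob's polarity score."""
    # TextBlob is a third-party package; it is imported here so that the
    # labelling helper above can be used without it being installed.
    from textblob import TextBlob
    return label_from_polarity(TextBlob(text).sentiment.polarity)

print(label_from_polarity(0.35))   # positive
print(label_from_polarity(-0.8))   # negative
print(label_from_polarity(0.0))    # neutral
```

Keeping the threshold logic separate from the TextBlob call makes it
easy to adjust the neutral band, for example to treat scores close to
zero as neutral.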

C. Dataset Description 

In the proposed system, we use a dataset file called
result.csv, which holds the newly fetched test data set.
It contains the following fields: Tweets, Len, ID, Date,
Source, Likes, RTs (Retweets) and SA (Sentiment Analysis).
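
As an illustration of this layout, the sketch below writes one record
with these fields using Python's standard csv module. The column
label "RTs", the numeric encoding of SA (1, 0, -1), the sample values
and the helper names build_row and write_csv are all assumptions made
for the example.

```python
import csv
import io

# Column labels follow the field list in the text; "RTs" is an assumed spelling.
FIELDS = ["Tweets", "Len", "ID", "Date", "Source", "Likes", "RTs", "SA"]

def build_row(text, tweet_id, date, source, likes, retweets, sa):
    """Assemble one record; Len is derived from the tweet text."""
    return {
        "Tweets": text, "Len": len(text), "ID": tweet_id, "Date": date,
        "Source": source, "Likes": likes, "RTs": retweets, "SA": sa,
    }

def write_csv(rows, handle):
    """Write a header line followed by one line per record."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical sample record; SA = 1 is assumed to mean "positive".
buffer = io.StringIO()
write_csv(
    [build_row("Great launch today", 1, "2020-01-01",
               "Twitter for Android", 5, 2, 1)],
    buffer,
)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; an
open file handle for result.csv could be passed instead.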

D. Software Description 

In this system, outputs such as tables, bar graphs and line
graphs are generated with the help of Jupyter Notebook. The
libraries and built-in types used are pandas, NumPy,
matplotlib (pyplot), lists and dictionaries. Pandas is used to
load the csv file into a data set. NumPy is one of the
essential libraries for scientific computing in Python. It
provides a high-performance multidimensional array object and
tools for working with these arrays. Python includes
several built-in container types: lists, dictionaries,
sets, and tuples. A list is the Python equivalent of an array,
but is resizable and can contain elements of different types. A
dictionary stores (key, value) pairs, like a Map in Java or an
object in JavaScript. The Python library TextBlob is
used for processing the textual data. It provides an API for
natural language processing (NLP) tasks such as
part-of-speech tagging, noun phrase extraction, sentiment
analysis, classification, translation, and more. Tweepy, an
open source library, is used for accessing the Twitter API.

E. Data Analysis and Visualization 

> On Twitter, users tweet their opinions on an occasion or
anything else, including a commodity or even a personality.
From their thoughts, the importance of that occasion and
the polarity of their tweets are analysed. Some of the
analyses performed on the dataset are as follows.

> Visualize the various sources of the tweets.

> Calculate the polarity of the tweets fetched.

> Visualize the polarity of the tweets (positive, negative,
neutral).

> Calculate the general review of the tweeters.

> Calculate the individual review of the tweeters.

> Visualize the tweeters' opinions in the form of a pie chart.
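
The general review behind the pie chart can be sketched as the share
of each polarity label among the fetched tweets. The helper name
polarity_share and the sample labels are hypothetical; the resulting
percentages could be passed to a plotting call such as matplotlib's
pyplot.pie.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def polarity_share(labels):
    """Return the percentage of tweets carrying each polarity label."""
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    counts = Counter(labels)
    return {label: 100.0 * counts[label] / total for label in LABELS}

# Hypothetical labels for four fetched tweets.
sample = ["positive", "positive", "negative", "neutral"]
print(polarity_share(sample))
# {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```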

F. Advantages of Proposed Scheme 

1. The system gives us a review on day to day happenings. 

2. Provides impartial reviews. 

3. Fast analysis 

4. Easily understandable by all. 

IV. EVALUATION 

Our scheme has a few differences from traditional schemes
that analyse pre-saved data. The first is the adoption of live
streaming of data. The second is that the output value is the
tweeters' current opinion.

Based on these features, our proposal has advantages as 
follows: 

> Lower computational cost 

> High Accuracy 

> Supporting privacy of users 

The polarity of tweets can be expressed at different levels,
indicating whether the opinion expressed in a document or
sentence is positive or negative. The subjectivity of tweets is
essentially the identification of subjective words and text
that show the presence of opinions. In the result shown in
Table 2 we can see the polarity of each baseline.

V. RESULT 

This is the sample output for the project for the keyword 
Donald Trump for 1000 tweets. 

Enter Keyword/Tag to search about: DonaldTrump
Enter how many tweets to search:

Fig 3 Searched keyword and number of tweets
slang used and the short forms of words. Many analyzers
don't perform well when the number of classes is
increased. Also, it has not yet been tested how accurate the
model will be for topics other than the one under
consideration. Hence sentiment analysis has wide scope for
future development.

A. Future Scope 

> We can perform deep sentiment analysis of text in
different areas of application. It is not adequate to say
that a text is overall positive or overall negative.
Users would like to know which separate topics are
talked about in the text, which of them are positive and
which are negative. So there will be an inclination
towards greater use of NLP techniques (such as
syntactic parsing), in addition to machine learning
methods.

> A more elaborate web-based application can be built
for this work in future.

> By using various classification strategies we can further
improve the results.

> By the use of sentiment analysis, we can forecast future
consequences, or at least anticipate them better, when
people tweet about the present scenario.